Solving similarity joins and range queries in metric spaces with the list of twin clusters

Authors

  • Rodrigo Paredes
  • Nora Reyes
Abstract

The metric space model abstracts many proximity or similarity problems, where the most frequently considered primitives are range and k-nearest neighbor search, leaving out the similarity join, an extremely important primitive. In fact, despite the great attention that this primitive has received in traditional and even multidimensional databases, little has been done for general metric databases. We solve two variants of the similarity join problem: (1) range joins: given two sets of objects and a distance threshold r, find all the object pairs (one from each set) at distance at most r; and (2) k-closest pair joins: find the k closest object pairs (one from each set). To this end, we devise a new metric index, coined List of Twin Clusters (LTC), which indexes both sets jointly, instead of the natural approach of indexing one or both sets independently. Finally, we show how to use the LTC to solve classical range queries. Our results show significant speedups over the basic quadratic-time naive alternative for both join variants, and that the LTC is competitive with the original list of clusters when solving range queries. Furthermore, we show that our technique has great potential for improvement.
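To make the two join variants concrete, the following is a minimal sketch of the quadratic-time naive baseline that the abstract compares against. It is not the LTC. The function names `range_join` and `k_closest_pairs` are hypothetical, and the Euclidean distance is only an assumed default; any metric could be passed as `d`.

```python
import heapq
from itertools import product
from math import dist  # Euclidean distance (Python 3.8+), an assumed default metric


def range_join(A, B, r, d=dist):
    """Naive range join: every pair (a, b), a in A and b in B, with d(a, b) <= r."""
    # Evaluates all |A| * |B| distances, hence quadratic time.
    return [(a, b) for a, b in product(A, B) if d(a, b) <= r]


def k_closest_pairs(A, B, k, d=dist):
    """Naive k-closest pair join: the k pairs (one object from each set) with smallest distance."""
    # Each candidate is a (distance, a, b) tuple; only the distance is used for ordering.
    candidates = ((d(a, b), a, b) for a, b in product(A, B))
    return heapq.nsmallest(k, candidates, key=lambda t: t[0])


if __name__ == "__main__":
    A = [(0.0, 0.0), (5.0, 5.0)]
    B = [(1.0, 0.0), (5.0, 6.0), (10.0, 10.0)]
    print(range_join(A, B, r=1.0))
    print(k_closest_pairs(A, B, k=2))
```

Both functions compute every cross-set distance, which is the cost an index that prunes pairs, such as the one proposed here, aims to avoid.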


Similar resources

Recursive Lists of Clusters: A Dynamic Data Structure for Range Queries in Metric Spaces

We introduce a novel data structure for solving the range query problem in generic metric spaces. It can be seen as a dynamic version of the List of Clusters data structure of Chávez and Navarro. Experimental results show that, with respect to range queries, it outperforms the original data structure when the database dimension is below 12. Moreover, the building process is much more efficient,...


A content-addressable network for similarity join in metric spaces

Similarity join is an interesting complement of the well-established similarity range and nearest neighbor search primitives in metric spaces. However, the quadratic computational complexity of similarity join prevents its application to large data collections. We present MCAN+, an extension of MCAN (a Content-Addressable Network for metric objects) to support similarity self join queries. The...


Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...


Similarity Join in Metric Spaces Using eD-Index

Similarity join in distance spaces constrained by the metric postulates is the necessary complement of the more famous similarity range and nearest neighbor search primitives. However, the quadratic computational complexity of similarity joins prevents their application to large data collections. We present the eD-Index, an extension of D-index, and we study an application of the eD-Index to imp...


(JCLR) property and fixed point in non-Archimedean fuzzy metric spaces

The aim of the present paper is to introduce the concept of joint common limit range property ((JCLR) property) for single-valued and set-valued maps in non-Archimedean fuzzy metric spaces. We also list some examples to show the difference between (CLR) property and (JCLR) property. Further, we establish common fixed point theorems using implicit relation with integral contractive condition. Se...



Journal title:
  • J. Discrete Algorithms

Volume 7


Publication date: 2009